Topic Modeling Using n-Grams and Non-negative Matrix Factorization
Authors
Abstract
Topic modeling is a machine learning technique used to discover themes in a collection of text documents. In this study, the topic model is Non-Negative Matrix Factorization (NMF) combined with n-gram features. Preprocessing steps such as removal of punctuation, digits, and stopwords are applied; before these steps, all words in the articles are converted to lowercase. The study also explores the effectiveness of unigrams, bigrams, and trigrams for forming topics, using the coherence value to determine the best number of topics. The data consist of 53,920 news articles from the portals RMOL.id and BeritaSatu.com for the period July to December 2022, and the distribution of the resulting topics is visualized with t-SNE. The results show that for the RMOL.id data, unigrams form 15 topics with a coherence value of 0.812748, bigrams 10 topics with 0.835738, and trigrams 7 topics with 0.830572, while for the BeritaSatu.com data the corresponding coherence values are 0.799718, 0.788762, and 0.801935.
Similar resources
Bayesian Non-negative Matrix Factorization
We present a Bayesian treatment of non-negative matrix factorization (NMF), based on a normal likelihood and exponential priors, and derive an efficient Gibbs sampler to approximate the posterior density of the NMF factors. On a chemical brain imaging data set, we show that this improves interpretability by providing uncertainty estimates. We discuss how the Gibbs sampler can be used for model ...
Robust non-negative matrix factorization
Non-negative matrix factorization (NMF) is a recently popularized technique for learning partsbased, linear representations of non-negative data. The traditional NMF is optimized under the Gaussian noise or Poisson noise assumption, and hence not suitable if the data are grossly corrupted. To improve the robustness of NMF, a novel algorithm named robust nonnegative matrix factorization (RNMF) i...
Non-Negative Multiple Matrix Factorization
Non-negative Matrix Factorization (NMF) is a traditional unsupervised machine learning technique for decomposing a matrix into a set of bases and coefficients under the non-negative constraint. NMF with sparse constraints is also known for extracting reasonable components from noisy data. However, NMF tends to give undesired results in the case of highly sparse data, because the information inc...
Pruning sparse non-negative matrix n-gram language models
In this paper we present a pruning algorithm and experimental results for our recently proposed Sparse Non-negative Matrix (SNM) family of language models (LMs). We show that when trained with only n-gram features SNMLM pruning based on a mutual information criterion yields the best known pruned model on the One Billion Word Language Model Benchmark, reducing perplexity with 18% and 57% over Ka...
Journal
Journal title: Jurnal Informasi dan Teknologi
Year: 2023
ISSN: 2714-9730
DOI: https://doi.org/10.60083/jidt.v5i1.385